Task-Focused Consolidation with Spaced Recall: Making Neural Networks Learn like College Students
Deep neural networks often suffer from a critical limitation known as catastrophic forgetting, where performance on past tasks degrades after learning new ones. This paper introduces Task-Focused Consolidation with Spaced Recall (TFC-SR), a novel continual learning approach inspired by human learning strategies such as Active Recall, Deliberate Practice, and Spaced Repetition. TFC-SR enhances the standard experience replay framework with a mechanism we term the Active Recall Probe: a periodic, task-aware evaluation of the model's memory that stabilizes the representations of past knowledge. We test TFC-SR on the Split MNIST and Split CIFAR-100 benchmarks against leading regularization-based and replay-based baselines. Our results show that TFC-SR performs significantly better than these methods; for instance, on Split CIFAR-100 it achieves a final accuracy of 13.17% compared to Standard Experience Replay's 7.40%. We demonstrate that this advantage comes from the stabilizing effect of the probe itself, not from a difference in replay volume. Additionally, we analyze the trade-off between memory size and performance and show that while TFC-SR performs better in memory-constrained environments, higher replay volume is still more effective when available memory is abundant. We conclude that TFC-SR is a robust and efficient approach, highlighting the importance of integrating active memory retrieval mechanisms into continual learning systems.
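The abstract describes experience replay augmented with a periodic, task-aware probe. A minimal sketch of that loop follows; the buffer, function names, and signatures (`ReplayBuffer`, `train_with_probe`, `probe_every`) are illustrative assumptions, not the authors' API:

```python
import random

class ReplayBuffer:
    """Fixed-capacity buffer filled by reservoir sampling over the stream."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)  # uniform over the stream so far
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def train_with_probe(stream, buffer, train_step, probe, probe_every=100, replay_k=32):
    """Standard experience replay plus a periodic probe, in the spirit of
    TFC-SR's Active Recall Probe (a hypothetical reading of the abstract)."""
    for step, example in enumerate(stream, start=1):
        batch = [example] + buffer.sample(replay_k)  # mix new data with replayed memories
        train_step(batch)
        buffer.add(example)
        if step % probe_every == 0:
            probe(step)  # task-aware evaluation of past-task memory
```

In this reading, the probe is a callback that evaluates the model on held-out data from each past task; the paper's claim is that running it periodically stabilizes old representations independently of replay volume.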
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom (0.04)
Sequential Function-Space Variational Inference via Gaussian Mixture Approximation
Zhu, Menghao Waiyan William, Hao, Pengcheng, Kuruoğlu, Ercan Engin
Continual learning is learning from a sequence of tasks with the aim of learning new tasks without forgetting old ones. Sequential function-space variational inference (SFSVI) is a continual learning method based on variational inference which uses a Gaussian variational distribution to approximate the distribution of the outputs at a finite number of selected inducing points. Since the posterior distribution of a neural network is multi-modal, a Gaussian distribution can match only one mode of the posterior, whereas a Gaussian mixture distribution can approximate it better. We propose an SFSVI method which uses a Gaussian mixture variational distribution. We also compare different types of variational inference methods with and without a fixed pre-trained feature extractor. We find that, in terms of final average accuracy, Gaussian mixture methods perform better than Gaussian methods, and likelihood-focused methods perform better than prior-focused methods.
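For concreteness, the density form of such a mixture variational distribution can be sketched as follows. This shows only the mixture log-density (computed with log-sum-exp for stability), not the authors' full SFSVI objective; the function name and diagonal-covariance parameterization are assumptions for illustration:

```python
import numpy as np

def gmm_logpdf(x, weights, means, stds):
    """Log-density of a diagonal-covariance Gaussian mixture:
    q(x) = sum_k w_k * N(x; mu_k, diag(std_k^2))."""
    x = np.asarray(x, dtype=float)
    comps = []
    for w, mu, s in zip(weights, means, stds):
        mu, s = np.asarray(mu, float), np.asarray(s, float)
        # Per-component Gaussian log-likelihood plus log mixture weight.
        ll = -0.5 * np.sum(((x - mu) / s) ** 2 + np.log(2 * np.pi * s ** 2))
        comps.append(np.log(w) + ll)
    m = max(comps)  # log-sum-exp trick avoids underflow
    return m + np.log(sum(np.exp(c - m) for c in comps))
```

With a single component this reduces to an ordinary Gaussian log-density; with several components, each mode of the multi-modal posterior can be covered by its own component.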
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States (0.04)
- Europe > France (0.04)
- Europe > Austria > Vienna (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
On the Computation of the Fisher Information in Continual Learning
Continual learning is a rapidly growing subfield of deep learning devoted to enabling neural networks to incrementally learn new tasks, domains or classes while not forgetting previously learned ones. Such continual learning is crucial for addressing real-world problems where data are constantly changing, such as in healthcare, autonomous driving or robotics. Unfortunately, continual learning is challenging for deep neural networks, mainly due to their tendency to forget previously acquired skills when learning something new. Elastic Weight Consolidation (EWC) [1], developed by Kirkpatrick and colleagues from DeepMind, is one of the most popular methods for continual learning with deep neural networks. To this day, this method is featured as a baseline in a large proportion of continual learning studies. However, in the original paper the exact implementation of EWC was not well described, and no official code was provided. A previous blog post by Huszár [2] already addressed an issue relating to how EWC should behave when there are more than two tasks.
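As background for the implementation question the paper raises, a common way EWC estimates the Fisher information is the diagonal "empirical Fisher": the average squared per-example gradient of the log-likelihood. The sketch below, for logistic regression, is a minimal illustration under that assumption; implementations notably differ in whether the label is the observed one (as here) or sampled from the model's own predictive distribution:

```python
import numpy as np

def diagonal_fisher(w, X, y):
    """Diagonal empirical Fisher for logistic regression: the mean
    squared per-example gradient of the log-likelihood at w."""
    fisher = np.zeros_like(w)
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))  # predicted probability of class 1
        g = (yi - p) * xi                  # gradient of log p(y | x, w)
        fisher += g * g
    return fisher / len(X)

def ewc_penalty(w, w_star, fisher, lam=1.0):
    """EWC quadratic penalty anchoring w to the previous-task solution w_star,
    weighted per-parameter by the Fisher estimate."""
    return 0.5 * lam * np.sum(fisher * (w - w_star) ** 2)
```

The penalty vanishes at the old solution and grows fastest along parameters the Fisher estimate deems important for past tasks.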
- Europe > Netherlands > South Holland > Delft (0.04)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- Asia > China (0.04)
- Health & Medicine (0.48)
- Education (0.47)
- Information Technology (0.34)
Task agnostic continual learning with Pairwise layer architecture
Most of the dominant approaches to continual learning are based on either memory replay, parameter isolation, or regularization techniques that require task boundaries to compute task statistics. We propose a static architecture-based method that uses none of these. We show that we can improve continual learning performance by replacing the final layer of our networks with our pairwise interaction layer. The pairwise interaction layer uses sparse representations from a winner-take-all style activation function to find the relevant correlations in the hidden layer representations. Networks using this architecture show competitive performance in MNIST- and FashionMNIST-based continual image classification experiments. We demonstrate this in an online streaming continual learning setup where the learning system cannot access task labels or boundaries.
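The abstract does not specify the exact form of the pairwise interaction layer, but one plausible sketch of the two ingredients it names — a winner-take-all sparsification followed by pairwise correlations of the surviving units — is the following; both function names and the upper-triangular feature layout are assumptions:

```python
import numpy as np

def winner_take_all(h, k):
    """Keep the k largest activations and zero out the rest (sparse code)."""
    out = np.zeros_like(h)
    idx = np.argsort(h)[-k:]  # indices of the k largest entries
    out[idx] = h[idx]
    return out

def pairwise_features(h_sparse):
    """Upper-triangular pairwise products of the hidden code; with a
    k-sparse input only k*(k+1)/2 of these terms can be non-zero."""
    outer = np.outer(h_sparse, h_sparse)
    iu = np.triu_indices(len(h_sparse))
    return outer[iu]
```

Because the winner-take-all step leaves only k active units, the pairwise feature vector is itself sparse, which is what limits interference between the representations of different tasks in this reading.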
Learn the Time to Learn: Replay Scheduling in Continual Learning
Klasson, Marcus, Kjellström, Hedvig, Zhang, Cheng
Replay methods are known to be successful at mitigating catastrophic forgetting in continual learning scenarios despite having limited access to historical data. However, while storing historical data is cheap in many real-world settings, replaying all of it is often infeasible due to processing-time constraints. In such settings, we propose that continual learning systems should learn the time to learn and schedule which tasks to replay at different time steps. We first demonstrate the benefits of our proposal by using Monte Carlo tree search to find a proper replay schedule, and show that the found replay schedules can outperform fixed scheduling policies when combined with various replay methods in different continual learning settings. Additionally, we propose a framework for learning replay scheduling policies with reinforcement learning. We show that the learned policies can generalize better in new continual learning scenarios compared to equally replaying all seen tasks, without added computational cost. Our study reveals the importance of learning the time to learn in continual learning, which brings current research closer to real-world needs.
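To make the search problem concrete: a replay schedule assigns, to each time step, which past task to replay, and a search procedure scores candidate schedules by the continual learner's final performance. The sketch below uses plain random search as a lightweight stand-in for the paper's Monte Carlo tree search; the function name and `evaluate` callback are illustrative assumptions:

```python
import random

def search_replay_schedule(num_tasks, steps, evaluate, n_candidates=64, seed=0):
    """Search over replay schedules (one task index per time step).
    `evaluate` maps a schedule to a scalar score, e.g. the final
    average accuracy of a learner trained under that schedule."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_candidates):
        schedule = tuple(rng.randrange(num_tasks) for _ in range(steps))
        score = evaluate(schedule)
        if score > best_score:
            best, best_score = schedule, score
    return best, best_score
```

The paper's contribution is replacing this blind search first with tree search and then with a learned reinforcement-learning policy that generalizes to new task sequences.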
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.94)
- Health & Medicine (0.92)
- Education > Educational Setting > Online (0.45)
TAME: Task Agnostic Continual Learning using Multiple Experts
Zhu, Haoran, Majzoubi, Maryam, Jain, Arihant, Choromanska, Anna
The goal of lifelong learning is to continuously learn from non-stationary distributions, where the non-stationarity is typically imposed by a sequence of distinct tasks. Prior works have mostly considered idealistic settings, where the identity of tasks is known at least at training. In this paper we focus on a fundamentally harder, so-called task-agnostic setting where the task identities are not known and the learning machine needs to infer them from the observations. Our algorithm, which we call TAME (Task-Agnostic continual learning using Multiple Experts), automatically detects the shift in data distributions and switches between task expert networks in an online manner. At training, the strategy for switching between tasks hinges on an extremely simple observation: for each new incoming task there occurs a statistically significant deviation in the value of the loss function that marks the onset of this new task. At inference, the switching between experts is governed by a selector network that forwards the test sample to its relevant expert network. The selector network is trained on a small subset of data drawn uniformly at random. We control the growth of the task expert networks, as well as of the selector network, by employing online pruning. Our experimental results show the efficacy of our approach on benchmark continual learning data sets, outperforming previous task-agnostic methods and even techniques that admit task identities at both training and testing, while using a comparable model size.
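The loss-deviation idea can be sketched with running statistics and a z-score threshold. This is a minimal stand-in for TAME's detection rule; the class name, window, and threshold are assumptions, and the paper's exact statistical test may differ:

```python
class TaskShiftDetector:
    """Flags a task boundary when the current loss deviates sharply
    from its recent running statistics (simple z-score rule)."""
    def __init__(self, window=100, z_thresh=4.0):
        self.window = window
        self.z_thresh = z_thresh
        self.losses = []

    def update(self, loss):
        """Return True if `loss` marks the onset of a new task."""
        if len(self.losses) >= self.window:
            mean = sum(self.losses) / len(self.losses)
            var = sum((x - mean) ** 2 for x in self.losses) / len(self.losses)
            std = max(var ** 0.5, 1e-8)
            if (loss - mean) / std > self.z_thresh:
                self.losses = [loss]  # restart statistics for the new task
                return True
        self.losses.append(loss)
        if len(self.losses) > self.window:
            self.losses.pop(0)  # sliding window over recent losses
        return False
```

On a boundary detection, TAME would spin up (or switch to) a new expert network; here the detector simply resets its statistics.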
Learning an evolved mixture model for task-free continual learning
Recently, continual learning (CL) has gained significant interest because it enables deep learning models to acquire new knowledge without forgetting previously learnt information. However, most existing works require knowing the task identities and boundaries, which is rarely realistic in practice. In this paper, we address a more challenging and realistic setting in CL, namely Task-Free Continual Learning (TFCL), in which a model is trained on non-stationary data streams with no explicit task information. To address TFCL, we introduce an evolved mixture model whose network architecture is dynamically expanded to adapt to the data distribution shift. We implement this expansion mechanism by evaluating the probability distance between the knowledge stored in each mixture model component and the current memory buffer using the Hilbert-Schmidt Independence Criterion (HSIC). We further introduce two simple dropout mechanisms that selectively remove stored examples in order to avoid memory overload while preserving memory diversity. Empirical results demonstrate that the proposed approach achieves excellent performance.
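The HSIC criterion mentioned above has a standard biased empirical estimator, sketched below with RBF kernels. This shows only the estimator itself, not how the paper wires it into the expansion decision; the kernel bandwidth and function names are assumptions:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix for row-vector samples X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def hsic(X, Y, gamma=1.0):
    """Biased empirical HSIC estimate, tr(K H L H) / (n - 1)^2,
    where H = I - 11^T / n is the centering matrix. Larger values
    indicate stronger statistical dependence between the samples."""
    n = X.shape[0]
    K, L = rbf_kernel(X, gamma), rbf_kernel(Y, gamma)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2
```

In the paper's scheme, a low score between a component's stored knowledge and the current buffer would signal a distribution shift and trigger the addition of a new mixture component.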
Target Layer Regularization for Continual Learning Using Cramer-Wold Generator
Mazur, Marcin, Pustelnik, Łukasz, Knop, Szymon, Pagacz, Patryk, Spurek, Przemysław
The concept of continual learning (CL), which aims to reduce the distance between human and artificial intelligence, has recently come to be regarded by the deep learning community as one of its main challenges. Generally speaking, it refers to the ability of a neural network to effectively learn consecutive tasks (in either supervised or unsupervised scenarios) while preventing the forgetting of already learned information. Therefore, when designing an appropriate strategy, it must be ensured that the network weights are updated in such a way that they correspond to both the current and all previous tasks. However, in practice, it is quite likely that a constructed CL model will suffer from either intransigence (difficulty acquiring new knowledge; see Chaudhry et al. [2018]) or the catastrophic forgetting (CF) phenomenon (a tendency to lose past knowledge; see McCloskey and Cohen [1989]). In recent years, methods for overcoming the above-mentioned problems have been the subject of wide and intensive investigation.
- North America > United States (0.14)
- Europe > Poland > Lesser Poland Province > Kraków (0.06)